ISA & ICA - Two Web Interfaces for Interactive Alignment of Bitexts
Abstract
ISA and ICA are two web interfaces for interactive alignment of parallel texts. ISA provides an interface for automatic and manual sentence alignment. It includes cognate filters, uses structural markup to improve automatic alignment, and provides intuitive tools for editing the resulting alignments. Alignment results can be saved to disk or sent via e-mail. ICA provides an interface to the clue aligner from the Uplug toolbox. It allows one to set various parameters and visualizes alignment results in a two-dimensional matrix. Word alignments can be edited and saved to disk.
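The two-dimensional matrix view mentioned above can be pictured with a minimal sketch. This is not ICA's implementation: the function names `render_alignment_matrix` and `toggle_link`, the text rendering, and the example sentence pair are all assumptions made for illustration. Rows stand for source tokens, columns for target tokens, and a marked cell means the two tokens are linked.

```python
def render_alignment_matrix(source, target, links):
    """Return a text grid: one row per source token, one column per target token.

    Hypothetical helper for illustration only; `links` is a list of
    (source_index, target_index) pairs.
    """
    linked = set(links)
    width = max(len(tok) for tok in source)
    # The header numbers the target tokens (mod 10) so each column stays one character wide.
    header = " " * width + " " + " ".join(str(j % 10) for j in range(len(target)))
    rows = [header]
    for i, tok in enumerate(source):
        cells = " ".join("x" if (i, j) in linked else "." for j in range(len(target)))
        rows.append(f"{tok:<{width}} {cells}")
    return "\n".join(rows)


def toggle_link(links, i, j):
    """Mimic a manual edit: add the link (i, j) if absent, remove it if present."""
    updated = set(links)
    updated ^= {(i, j)}
    return sorted(updated)


# Invented example data, not taken from the paper.
source = ["the", "house", "is", "small"]
target = ["das", "Haus", "ist", "klein"]
links = [(0, 0), (1, 1), (2, 2), (3, 3)]

print(render_alignment_matrix(source, target, links))
```

Toggling a cell, as in `toggle_link(links, 0, 1)`, returns a new link list, which roughly corresponds to editing a word alignment before saving it.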
Similar resources
Bitext Maps and Alignment via Pattern Recognition
Texts that are available in two languages (bitexts) are becoming more and more plentiful, both in private data warehouses and on publicly accessible sites on the World Wide Web. As with other kinds of data, the value of bitexts largely depends on the efficacy of the available data mining tools. The first step in extracting useful information from bitexts is to find corresponding words and/or tex...
Improving English-Russian sentence alignment through POS tagging and Damerau-Levenshtein distance
The present paper introduces an approach to improving English-Russian sentence alignment, based on POS tagging of source and target texts automatically aligned by HunAlign. The initial hypothesis is tested on a corpus of bitexts. Sequences of POS tags for each sentence (specifically, nouns, adjectives, verbs and pronouns) are processed as “words” and the Damerau-Levenshtein distance between them is computed...
Bitextor, a free/open-source software to harvest translation memories from multilingual websites
Bitextor is a free/open-source application for harvesting translation memories from multilingual websites. It downloads all the HTML files in a website, preprocesses them into a coherent format and, finally, applies a set of heuristics to select pairs of files which are candidates to contain the same text in two different languages (bitexts). From these parallel texts, translation memories are ...
Preparation and exploitation of bilingual texts
A bitext is a merged document composed of two versions of a given text, usually in two different languages. An aligned bitext is produced by an alignment tool, or aligner, that automatically matches the two versions of the same text, generally sentence by sentence. A multilingual aligned corpus or collection of aligned bitexts, when consulted with a search tool, can be extremely useful for...
Interactive Word Alignment for Language Engineering
In this paper we report ongoing work on developing an interactive word alignment environment that will assist a user to quickly produce accurate full-coverage word alignment in bitexts for different language engineering tasks, such as MT lexicons and gold standards for evaluation. The system uses a graphical interface, static and dynamic resources as well as machine learning techniques. We also...